Underdog VLM: Moondream 3.0 with Only 2B Activated Parameters Surpasses GPT-5 and Claude 4
The preview release of Moondream 3.0 is a vision language model built on a lightweight mixture-of-experts architecture with 9B total parameters, of which only 2B are activated. It performs well in complex scenarios and surpasses mainstream models such as GPT-5, Gemini, and Claude 4 on multiple benchmarks. Whereas version 2.0 was known for its CAPTCHA recognition, version 3.0 significantly expands the model's visual reasoning capabilities, and it has drawn widespread attention from the AI community.
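To make the "total versus activated parameters" idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind mixture-of-experts models. It is an illustration under stated assumptions, not Moondream's implementation: the expert count (8), the top-k value (2), the scalar "experts", and the router scores are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize the scores over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    """Run only the k selected experts and mix their outputs by weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical configuration, NOT Moondream's actual settings.
NUM_EXPERTS = 8
TOP_K = 2

# Toy experts: expert i simply multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Hypothetical router scores for one input token.
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

selected = route_top_k(router_logits, TOP_K)
y = moe_forward(3.0, experts, router_logits, TOP_K)

# Fraction of expert blocks that actually ran for this input.
active_fraction = TOP_K / NUM_EXPERTS
```

In this toy setup only two of the eight experts run for a given input, so most expert weights sit idle on any single forward pass. In a real model the activated share also includes components every input passes through, such as attention layers and embeddings, so the ratio of activated to total parameters is not simply k divided by the number of experts.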